Nearest neighbor embedding with different time delays 
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A nearest neighbor based selection of time delays for phase space reconstruction is proposed and 
compared to the standard use of time delayed mutual information. The possibility of using different 
time delays for consecutive dimensions is considered. A case study of numerically generated 
solutions of the Lorenz system is used for illustration. The effect of contamination with various 
levels of additive Gaussian white noise is discussed. 
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Reconstructing the phase space of a dynamical system 
from a time series is a well-known mathematical result 
central to almost all nonlinear time series analysis meth- 
ods (see for a general introduction). It is of paramount 
importance as it ensures that, under certain generic con- 
ditions, such a reconstruction is equivalent to the original 
phase space. This equivalence ensures that differential 
information is preserved and allows for both qualitative 
and quantitative analysis. Consider a smooth determin- 
istic dynamical system s(t) = / (s(io)), either in continu- 
ous or discrete time, whose trajectories are asymptotic to 
a compact d-dimensional manifold A. When performing 
fc-dimensional measurements, where k = 1, . . . , d, a func- 
tion X(i) = h[s(t = ixS)] relates the states of the dynami- 
cal system throughout time and a time series of measured 
points, where € R*, i — 1, . . . , n; n is the total num- 
ber of sampled points, and 5 is the sampling time. As a 
consequence of our ignorance on the system, or of limita- 
tions of the measurement apparatus, or simply because 
it is too costly, d-dimensional measurements are typically 
not made [lj,|6|. In this report we will only address scalar 
measurements, that is, k = 1. Phase space reconstruction 
by time delay embedding is a method of generating an 
m-dimensional manifold that is equivalent to the original 
d-dimensional manifold, by means of a matrix of delay- 
coordinate vectors. Consider a column vector time series 
X(^ . Define an m-dimensional matrix of delay-coordinate 
column vectors by adding together displaced copies of the 
time series, X = [x (l ),X( i+T ), . . . , x (l+ ( m _ 1 ) r) ]. Such ma- 
trix Xj„_( m _ 1 ) XT ln ] is called an embedding matrix, and 
two parameters need to be optimally estimated. The first 
is the time delay r, which quantifies the time displace- 
ment between successive delay-coordinate vectors. The 



* spinto@itqb.unl.pt 



second is the embedding dimension m, which quantifies 
the number of such delay-coordinate vectors. In this re- 
port we only address the estimation of r, by suggest- 
ing a nearest neighbor based procedure and comparing 
it to the standard use of time delayed mutual informa- 
tion. Though in the limit of infinite data and infinite 
precision r may be set to any arbitrary value, a balance 
between relevance and redundancy [l| must be accom- 
plished for real data. When r is too small, the elements 
of the delay-coordinate vectors will mostly be around the 
bisectrix of the phase space and, consequently, the recon- 
struction will not be satisfactory. On the contrary, if r 
is too large the delay-coordinate vectors will become in- 
creasingly uncorrelated, with the consequent loss of abil- 
ity to recover the underlying attractor. In addition, using 
a time delay larger than necessary will render fewer data 
points for the reconstruction. This may be particularly 
limiting for the study of biological systems, where data 
sets are often not long. The most common procedure for 
selecting r is using the first minimum of time delayed mu- 
tual information, as proposed by Fraser and Swinney 0: 
I{x( i) ,X( l+T ),T) = H(x(i))+H(x( i+T ))-H(x(i) ) X{ i+T )) = 

EK^w . s(*fT)) iog 2 pSwK+t!) ' where H ^ is thc 

Shannon entropy Nonetheless its widespread use, 
some drawbacks can be pointed out to this selection cri- 
terion. The first is that probabilities are estimated by 
creating a histogram for the probability distribution of 
the data, which depends on a heuristic choice of num- 
ber of bins, for example, log 2 of the total number of 
points 0- Therefore, / depends on the partitioning. 
The second drawback is that it contains no dynamical 
information, which might be incorporated by consider- 
ing transition rather than static probabilities, but such 
correction is usually not made [7|. The third is that the 
selection criterion presented by Fraser and Swinney 0], 
though generalized to higher dimensions, was established 
for two-dimensional embeddings and does not neces- 
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sarily hold for higher dimensional embeddings, as shown 
below. Finally, a fourth drawback [lj is associated with 
the fact that, when the purpose is solely to maximize 
statistical independence |2j, there is no obvious reason 
to choose the first minimum over others. We propose 
an alternative measure for selecting time delays, based 
on nearest neighbor estimations. This nearest neighbor 
measure is inspired by the false nearest neighbors algo- 
rithm proposed by Kennel et al. 0]. With minimal as- 
sumptions, this measure is based solely on topological 
and dynamical arguments documented by the data. We 
do not address the estimation of to. The embedding the- 
orem proposed by Takens guarantees a solution. It 
states that if a map from the original d-dimensional phase 
space A, to the reconstructed m-dimensional phase space 
is generic, when m > (2d + 1) that map is a diffeomor- 
phism on A, that is, an embedding, or a smooth one-to- 
one map with a smooth inverse. This one-to-one property 
implies that if the system is deterministic, distinct points 
on the attractor A are mapped to distinct points under 
the embedding map 6]. Nevertheless, Takens result is 
only a sufficient condition according to Kennel et al. , 
who propose the use of false nearest neighbors (F) as a 
criteria. Their algorithm considers the ratio of Euclidean 
distances between a point and its nearest neighbor, first 
on a m-dimensional and then on a (to + l)-dimensional 
space. If the ratio is greater than a given threshold, these 
points are referred to as F, that is, points that appear to 
be nearest neighbors not because of the dynamics, but 
because the attractor is being viewed in an embedding 
space too small to unfold it. The procedure is repeated 
for all points in the time series. As the fraction of F as 
a function of the embedding space dimension decreases 
for deterministic systems, when its value is zero, the un- 
derlying attractor is unfolded and to can be optimally 
estimated. 

Considering the problem of optimally selecting time 
delays, we will compare two different approaches. The 
first is a standard procedure and uses the first minimum 
of the time delayed mutual information to set r for all 
dimensions 0. The second, the one we propose, uses the 
first minimum of a nearest neighbor measure to set the 
time delay for each dimension. Therefore, this second 
procedure is iterative and introduces two novelties: us- 
ing a nearest neighbor based measure instead of the time 
delayed mutual information, and using different time de- 
lays for consecutive dimensions, as the standard use of 
the same r value is an assumption out of convenience 
and not imposed by any theoretical argument [I). In 
both cases, the embedding dimension is estimated as the 
fraction of F decreases to zero. The implementation of 
the standard procedure is described below, (i) Consider 
an initial column vector time series Xn\ . For each r being 
tested, t = 1, . . . , jqU, build a temporary embedding ma- 
trix T = [x(j), X(j_|_ T )] out of two column vectors X(j) and 
X( i+T ). The upper limit for r is set arbitrarily, (ii) Esti- 
mate the time delayed mutual information X( i+T )). 
(iii) Select the first minimum from the profile of / vs t, 



which will be the optimal time delay for all dimensions 
(columns) of the final embedding matrix X. (iv) Esti- 
mate the percentage of F (algorithm in Q) as a function 
of the dimensionality of the embedding matrix. The op- 
timal embedding dimension is set when the fraction of F 
drops to 0. As r is the same for all dimensions it will be 
referred to as a fixed time delay, and the final embedding 
matrix will be X = [x w ,x (i+T) , . . . , X( i+ ( m _ X ) T )]. The 
implementation of the proposed algorithm is as follows, 
(i) Consider an initial column vector time series x^) . For 
each r being tested, r = 1, . . . , -^n, build a temporary 
embedding matrix T = [x/j),X( i+T )]. (ii) For each two- 
dimensional point, that is, for each row in matrix T, es- 
timate its (two-dimensional) nearest neighbor. Calculate 
the Euclidean distance between them, cLei- (hi) Con- 
sider both points one sampling unit ahead and calculate 
the new Euclidean distance between them, dE2- (iv) Es- 
timate dE2/d-Ei and save the number of distance ratios 
larger than 10. That fraction will be referred to as N. 
The threshold value, though heuristically set, is justified 
by numerical studies £| and has low parametric sensitiv- 
ity, (v) Select the first minimum from a profile of N vs 
r, which will be the optimal time delay for this first em- 
bedding cycle, T\. We define an embedding cycle as each 
iteration [steps (i) to (vi)] that adds another dimension to 
the embedding matrix, (vi) Estimate the percentage of 
F. Save that value as a function of the dimensionality of 
the temporary embedding matrix T. (vii) Consider now 
matrix X = [x.u) , X( i+ri )] as the starting point for the 
second embedding cycle. For each r being tested, build a 
temporary embedding matrix T = \xu\, X( i+Tl ), X( i+r )]. 
(viii) Repeat steps (ii) to (vii), considering that points 
are now three- and more dimensional, until the fraction 
of F drops to 0. As there will be a vector of r val- 
ues [ti, . . . , T( m _ 1 )], this procedure is said to use differ- 
ent time delays, and the final embedding matrix will be 
X = [X(,),X(i +T1 ),X(j +T2 ), . . . , X( l+T(m 

The Lorenz system [fj of differential equations x = 
a(y — x), y = x(p — z) —y, z = xy — (3z, with parameters 
(t = 10, p = 28, [3 = 8/3 will be used as a case study. 
The equations were numerically integrated with a 4-5th 
order Runge-Kutta algorithm and sampled at 5 = 0.01 
intervals. Transients were removed. We will consider a 
first data set, referred to as L(X), which is the noise-free 
cc-coordinate of Lorenz system. We will also consider 
a second data set, referred to as L(X V ), consisting of 
the noise-free ^-coordinate of the Lorenz system contam- 
inated with additive Gaussian white noise of mean zero 
and variance 0.05, I, 2, 3, or 5. For the major part of 
this report, noise of variance 1 will be used. The other 
variances will be used later to further document the ef- 
fect of noise on the r selecting procedures. Real systems 
may also be contaminated with dynamical noise, though 
we do not address such possibility here. Each data set 
includes a total of 8000 points, (|)th of which is plotted 
in Fig. 1. 

The profiles for selecting r are displayed in Fig. 2 for 
the first embedding cycle, and in Fig. 3 for the second 
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FIG. 1: Data sets: noise-free ^-coordinate of the Lorenz sys- 
tem [L(A")], and contaminated with additive Gaussian white 
noise of mean and variance 1 [L(X V )]. 



embedding cycle. Dashed lines represent I profiles, while 
solid lines represent N profiles. Because the standard 
t selecting procedure uses the same time delay for all 
dimensions, only N profiles are represented in Fig. 3. 
Displayed on the upper panels are the profiles for L(X) 
[Fig. 2(a) and Fig. 3(a)] and L(X n ) of variance I [Fig. 
2(b) and Fig. 3(b)]. On the lower panel, zoomed out 
versions of those same profiles are plotted to document 
behavior beyond dynamic coupling. An arrow indicates 
the global minimum of N for the noise-free scenario [Fig. 
2(a) and (c), Fig. 3(a) and (c), solid line]. That same 
value can still be identified for the first embedding cycle 
of the noisy data set L(X V ), though it is no longer a global 
minimum [Fig. 2(d), solid line]. As explained previously, 
a value for r is selected from the first minimum of the 
I profile for both L(X) and L(X n ) [Fig. 2(a) and (b), 
dashed line]; and a value for t\ is selected from the first 
minimum of the N profile for both L(X) and L(X V ) [Fig. 
2(a) and (b), solid line]. For the second embedding cycle, 
an asterisk indicates ti, that is, the value selected in the 
first embedding cycle, while a circle indicates T2, that is, 
the first minimum of this new N profile [Fig. 3(a)]. 

Three main conclusions can be drawn from examining 
the r selecting profiles from both / and N for the first and 
second embedding cycles. The first is that only N retains 
the inverse relationship with structure disclosure, that is, 
unlike /, N values return to higher levels when the time 
delay is too long for dynamical coupling to be retained 
[Fig. 2(c) and Fig. 3(c), arrow]. This global minimum 
suggests an upper limit for the efficient selection of r, 
beyond which statistical independence reflects dynamic 
decoupling, and provides the strongest argument for the 
use of N over /. The effect of noise will be discussed 
later. The second observation is that the profiles for both 
embedding cycles are strikingly different, indicating that 
reusing the time delay from a previous embedding cycle 
is not an efficient procedure, as the N profile peaks at 
Ti [Fig. 3(a), asterisk]. This peaking, an interesting but 
presently unclear feature, was consistently observed for 
all embedding cycles, and not only for the data sets ana- 
lyzed here but also for other systems, such as the Rossler 
attractor not shown here for space constraints. The 
third conclusion refers to the disruptive effect of additive 
noise, particularly clear in Fig. 3(b). To further docu- 
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FIG. 2: First embedding cycle profiles for r selection from I 
(dashed lines) and N (solid lines). Upper panel: (a) L(X) 
and (b) L(X V ) of variance I. Lower panel: zoomed out (c) 
L(X) and (d) L(X V ) of variance I. An arrow indicates the 
global minimum of N for the noise-free scenario [(a) and (c), 
solid line]. 
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FIG. 3: Second embedding cycle profiles for r selection from 
N. Upper panel: (a) L(X) and (b) L(X V ) of variance f. 
Lower panel: zoomed out (c) L(X) and (d) L(X V ) of variance 
f. An arrow indicates the global minimum of N for the noise- 
free scenario [(a) and (c)]. An asterisk indicates ri, while a 
circle indicates T2 [(a)]. 

ment such effect, profiles from N for the second embed- 
ding cycle and additive Gaussian white noise of different 
variances are displayed in Fig. 4. The noise-free scenario 
[Fig. 4(a), as in Fig. 3(a)] is compared to additive Gaus- 
sian white noise of mean and variance 0.05 [Fig. 4(b)], 
I [Fig. 4(c), as in Fig. 3(b)], 2 [Fig. 4(d)], 3 [Fig. 4(e)], 
and 5 [Fig. 4(f)]. All profiles peak exactly at t±, as had 
been previously observed in Fig. 3. Interestingly, the r 
value that is a global minimum in the noise-free scenario 
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[Fig. 4(a), arrow] is in most cases still identifiable. This 
feature may be a helpful guideline, for the global mini- 
mum, though being a suboptimal choice, sets the upper 
limit for the selection of r values. 
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FIG. 4: Second embedding cycle profiles for t selection from 
TV: (a) L(X) [as in Fig. 3(a)] and L(X V ) contaminated with 
additive Gaussian white noise of mean and variance (b) 
0.05, (c) 1 [as in Fig. 3(b)], (d) 2, (e) 3, and (f) 5. An arrow 
indicates the global minimum of TV for the noise-free scenario 
[as in Fig. 3(a)]. 



The second part of phase space reconstruction implies 
the estimation of the embedding dimension. Fig. 5 doc- 
uments the profiles of F for increasing m values, for the 
noise- free L(X) data set. A dashed line represents the 
conventional use of I and fixed time delays, while a thick 
solid line represents using TV and selecting different time 
delays. The reconstruction using the later is more ef- 
ficient, in the sense that, though both I and TV suggest 
m = 3 as the optimal embedding dimension, the percent- 
age of F when m — 2 is lower for TV. We have argued 
that the global minimum of TV from the noise-free sce- 
nario would be an upper limit for the efficient selection 



of t values. A thin solid line represents selecting the 
global minimum of TV as the r value for all embedding 
cycles, and it is clearly a suboptimal choice. This further 
confirms the relevance of the global minimum as a crite- 
rion for upper-limiting the region where the selection of 
time delays should be made. 

In summary, the nearest neighbor measure we propose, 
unlike mutual information, retains the inverse relation- 
ship with structure disclosure. This is an extremely use- 
ful feature for analyzing noisy time series as it allows for 
the determination of an upper limit to an efficient selec- 
tion of time delays. Another extremely important result 
is that the use of different time delays is more efficient 
than the conventional use of a fixed time delay. 
1 




FIG. 5: Profiles for m selection from F for L(X): selecting a 
fixed t from the first minimum of I (dashed line), different r 
values from the first minimum of TV (thick solid line), and a 
fixed r from the global minimum of TV (thin solid line). 
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